Gated networks: an inventory
Gated networks are networks that contain gating connections, in which the
outputs of at least two neurons are multiplied. Initially, gated networks were
used to learn relationships between two input sources, such as pixels from two
images. More recently, they have been applied to learning activity recognition
or multi-modal representations. The aims of this paper are threefold: 1) to
explain the basic computations in gated networks to the non-expert, adopting a
standpoint that emphasizes their symmetric nature; 2) to serve as a quick
reference guide to the recent literature, by providing an inventory of
applications of these networks, as well as recent extensions to the basic
architecture; and 3) to suggest future research directions and applications.
Comment: Unpublished manuscript, 17 pages
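The core computation is easy to state concretely. As a minimal sketch (the dimensions, the three-way weight tensor, and the einsum formulation are illustrative choices, not taken from the paper), a gating connection multiplies the outputs of two input streams and projects the products onto hidden units, so that the two inputs play symmetric roles:

```python
import numpy as np

def gated_layer(x, y, W):
    """Gating connection: multiply two input streams, then project.

    x: (dx,) first input (e.g. pixels from image 1)
    y: (dy,) second input (e.g. pixels from image 2)
    W: (dh, dx, dy) three-way weight tensor
    Each hidden unit computes h_k = sum_ij W[k, i, j] * x[i] * y[j],
    i.e. it responds to *products* of the two inputs, and x and y
    enter the computation symmetrically.
    """
    return np.einsum('kij,i,j->k', W, x, y)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
y = rng.standard_normal(5)
W = rng.standard_normal((3, 4, 5))
h = gated_layer(x, y, W)
print(h.shape)  # (3,)
```

Equivalently, the layer is a linear map applied to the outer product of the two inputs, which is what makes gated networks natural for learning relationships such as transformations between two images.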
Policy Search in Continuous Action Domains: an Overview
Continuous action policy search is currently the focus of intensive research,
driven both by the recent success of deep reinforcement learning algorithms and
the emergence of competitors based on evolutionary algorithms. In this paper,
we present a broad survey of policy search methods, providing a unified
perspective on very different approaches, also including Bayesian Optimization
and directed exploration methods. The main message of this overview lies in the
relationships between the families of methods, but we also outline some factors
underlying the sample efficiency properties of the various approaches.
Comment: Accepted in the Neural Networks Journal (Volume 113, May 2019)
Path Integral Policy Improvement with Covariance Matrix Adaptation
There has been a recent focus in reinforcement learning on addressing
continuous state and action problems by optimizing parameterized policies. PI2
is a recent example of this approach. It combines a derivation from first
principles of stochastic optimal control with tools from statistical estimation
theory. In this paper, we consider PI2 as a member of the wider family of
methods which share the concept of probability-weighted averaging to
iteratively update parameters to optimize a cost function. We compare PI2 to
other members of the same family, Cross-Entropy Methods and CMA-ES, at the
conceptual level and in terms of performance. The comparison suggests the
derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy
Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is
that it determines the magnitude of the exploration noise automatically.
Comment: ICML 2012
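The shared concept of probability-weighted averaging can be sketched in a few lines (a toy quadratic cost, PI2-style exponentiated cost weights; the constants and the cost normalization are illustrative, not the paper's exact update rule):

```python
import numpy as np

def pw_update(theta, cost, rng, n_samples=50, sigma=0.1, h=10.0):
    """One probability-weighted averaging step: the update rule shared,
    in spirit, by PI2, Cross-Entropy Methods and CMA-ES (this toy
    version uses exponentiated, normalized costs as weights)."""
    # Sample parameter perturbations around the current estimate.
    samples = theta + sigma * rng.standard_normal((n_samples, theta.size))
    costs = np.array([cost(s) for s in samples])
    # Lower cost -> exponentially larger weight (normalized over samples).
    c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    w = np.exp(-h * c)
    w /= w.sum()
    # The new parameters are the probability-weighted average of the samples.
    return w @ samples

rng = np.random.default_rng(42)
cost = lambda th: float(np.sum((th - 1.0) ** 2))  # toy cost, minimum at th = 1
theta = np.zeros(3)
for _ in range(100):
    theta = pw_update(theta, cost, rng)
print(cost(theta))
```

The family members differ mainly in where the weights come from (path-integral costs, elite selection, rank-based weights) and in whether the covariance of the sampling distribution is adapted as well, which is precisely the dimension along which PI2-CMA extends PI2.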
Comparing Semi-Parametric Model Learning Algorithms for Dynamic Model Estimation in Robotics
Physical modeling of robotic system behavior is the foundation for
controlling many robotic mechanisms to a satisfactory degree. Mechanisms are
also typically designed in a way that good model accuracy can be achieved with
relatively simple models and model identification strategies. If physically
based models cannot achieve sufficient accuracy, or become too complex,
model-free methods based on machine learning techniques can help. Of
particular interest to us was therefore the question of to what degree
semi-parametric modeling techniques, i.e., combinations of physical models
with machine learning, increase the modeling accuracy of inverse dynamics
models, which are typically used in robot control. To this end, we evaluated semi-parametric Gaussian
process regression and a novel model-based neural network architecture, and
compared their modeling accuracy to a series of naive semi-parametric,
parametric-only and non-parametric-only regression methods. The comparison has
been carried out on three test scenarios, one involving a real test-bed and two
involving simulated scenarios, with the most complex scenario targeting the
modeling of a simulated robot's inverse dynamics. We found that in all but
one case, semi-parametric Gaussian process regression yields the most accurate
models, with little tuning required for the training procedure.
Smooth Exploration for Robotic Reinforcement Learning
Reinforcement learning (RL) enables robots to learn skills from interactions
with the real world. In practice, the unstructured step-based exploration used
in Deep RL -- often very successful in simulation -- leads to jerky motion
patterns on real robots. Consequences of the resulting shaky behavior are poor
exploration, or even damage to the robot. We address these issues by adapting
state-dependent exploration (SDE) to current Deep RL algorithms. To enable this
adaptation, we propose two extensions to the original SDE, using more general
features and re-sampling the noise periodically, which leads to a new
exploration method, generalized state-dependent exploration (gSDE). We evaluate
gSDE both in simulation, on PyBullet continuous control tasks, and directly on
three different real robots: a tendon-driven elastic robot, a quadruped and an
RC car. The noise sampling interval of gSDE permits a trade-off between
performance and smoothness, which allows training directly on the real
robots without loss of performance. The code is available at
https://github.com/DLR-RM/stable-baselines3.
Comment: Code: https://github.com/DLR-RM/stable-baselines3/ Training scripts: https://github.com/DLR-RM/rl-baselines3-zoo
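The exploration scheme described above can be sketched as follows (class name, feature choice, and constants are illustrative, not the stable-baselines3 API): the noise is a deterministic function of state features, and the random weights of that function are only re-sampled every few steps, so within an interval the noise varies smoothly with the state instead of jumping at every step.

```python
import numpy as np

class GSDENoise:
    """Sketch of generalized state-dependent exploration (gSDE).

    Exploration noise eps = W @ features(s): a deterministic function of
    the state, with the random weight matrix W re-sampled only every
    `sample_freq` steps. Identical states within one interval get
    identical noise, which avoids jerky motion on real robots.
    """
    def __init__(self, feat_dim, act_dim, sigma=0.1, sample_freq=16, seed=0):
        self.sigma, self.sample_freq = sigma, sample_freq
        self.shape = (act_dim, feat_dim)
        self.rng = np.random.default_rng(seed)
        self.steps = 0
        self._resample()

    def _resample(self):
        self.W = self.sigma * self.rng.standard_normal(self.shape)

    def __call__(self, features):
        if self.steps % self.sample_freq == 0 and self.steps > 0:
            self._resample()  # periodic re-sampling of the noise weights
        self.steps += 1
        return self.W @ features

noise = GSDENoise(feat_dim=3, act_dim=2)
s = np.array([0.5, -0.2, 1.0])
e1, e2 = noise(s), noise(s)
print(np.allclose(e1, e2))  # same state, same interval -> same noise
```

Lengthening `sample_freq` smooths the motion further at the price of slower-varying exploration, which is the performance/smoothness trade-off mentioned in the abstract.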
Many regression algorithms, one unified model — A review
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate that the function representations whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms which, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression.
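The two function representations mentioned above can be made concrete in a few lines (centers, widths, and data are illustrative): a weighted sum of radial basis functions, and a mixture of linear models with normalized responsibilities, where setting all slopes to zero recovers a weighted sum of (normalized) basis functions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(100)

centers = np.linspace(0, 1, 10)
phi = np.exp(-0.5 * ((x[:, None] - centers) / 0.1) ** 2)  # RBF activations

# 1) Weighted sum of basis functions: f(x) = sum_e w_e * phi_e(x)
w, *_ = np.linalg.lstsq(phi, y, rcond=None)
f_basis = phi @ w

# 2) Mixture of linear models: f(x) = sum_e g_e(x) * (a_e * x + b_e),
#    with normalized responsibilities g_e. With all slopes a_e = 0 this
#    collapses to a weighted sum of normalized basis functions, which is
#    the special-case relationship the paper establishes.
g = phi / phi.sum(axis=1, keepdims=True)
design = np.column_stack([g * x[:, None], g])  # per-model slope and offset
ab, *_ = np.linalg.lstsq(design, y, rcond=None)
f_mix = design @ ab

print(np.abs(f_basis - y).mean(), np.abs(f_mix - y).mean())
```

Both representations fit the toy data well; the point of the unified view is that algorithms as different as RBF networks and locally weighted regression are regressing parameters of the same underlying model family.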
Covariance Matrix Adaptation for Direct Reinforcement Learning
Solving problems with continuous states and actions by optimizing parametric policies is a topic of recent interest in reinforcement learning. The PI2 algorithm is an example of this approach, which benefits from solid mathematical foundations drawn from stochastic optimal control and from the tools of statistical estimation theory. In this article, we consider PI2 as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters in order to optimize a cost function. We compare PI2 to other members of the same family, the Cross-Entropy Method and CMA-ES, at the conceptual level and in terms of performance. The comparison leads to the derivation of a new algorithm that we call PI2-CMA, for "Path Integral Policy Improvement with Covariance Matrix Adaptation". The main advantage of PI2-CMA is that it determines the magnitude of the exploration noise automatically.
Fault-Tolerant Six-DoF Pose Estimation for Tendon-Driven Continuum Mechanisms
We propose a fault-tolerant estimation technique for the six-DoF pose of a tendon-driven continuum mechanism using machine learning. In contrast to previous estimation techniques, no deformation model is required; the pose prediction is instead performed with polynomial regression. As only a few datapoints are required for the regression, several estimators are trained with structured occlusions of the available sensor information, and clustered into ensembles based on the available sensors. By computing the variance within one ensemble, the uncertainty in the prediction is monitored and, if the variance exceeds a threshold, sensor loss is detected and handled. Experiments on the humanoid neck of the DLR robot DAVID demonstrate that the accuracy of the predicted pose is significantly improved, and that a reliable prediction can still be performed using only 3 out of 8 sensors.
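The ensemble-variance idea can be sketched on a toy problem (a 1-D pose, linear sensors, and linear regressors standing in for the polynomial ones; all names and constants are illustrative): train one estimator per sensor subset, then flag sensor loss when the estimators start disagreeing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_train = 4, 200
pose = rng.uniform(-1, 1, n_train)
A = rng.standard_normal(n_sensors)             # per-sensor gains
readings = np.outer(pose, A) + 0.01 * rng.standard_normal((n_train, n_sensors))

# Train one estimator per structured occlusion (here: leave-one-sensor-out).
subsets = [[j for j in range(n_sensors) if j != i] for i in range(n_sensors)]
models = []
for sub in subsets:
    w, *_ = np.linalg.lstsq(readings[:, sub], pose, rcond=None)
    models.append((sub, w))

def ensemble_predict(r):
    """Mean pose prediction and ensemble variance for one reading r."""
    preds = np.array([r[sub] @ w for sub, w in models])
    return preds.mean(), preds.var()

r_ok = 0.3 * A                        # healthy reading for pose = 0.3
r_bad = r_ok.copy()
r_bad[0] = 0.0                        # sensor 0 lost
_, var_ok = ensemble_predict(r_ok)
_, var_bad = ensemble_predict(r_bad)
print(var_ok, var_bad)                # variance jump reveals the sensor loss
```

When a sensor drops out, only the estimators that use it are corrupted, so the within-ensemble variance rises sharply; the system can then fall back on the estimators trained without that sensor.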
Sensorimotor impairment and haptic support in microgravity
Future space missions envisage human operators teleoperating robotic systems from orbital spacecraft. A potential risk for such missions is the observation that sensorimotor performance deteriorates during spaceflight. This article describes an experiment on sensorimotor performance in two-dimensional manual tracking during different stages of a space mission. We investigated whether there are optimal haptic settings of the human-machine interface for microgravity conditions. Two empirical studies using the same task paradigm with a force feedback joystick under different haptic settings (no haptics, four spring stiffnesses, two motion dampings, three masses) are presented in this paper: (1) a terrestrial control study (N = 20 subjects) with five experimental sessions to explore potential learning effects and interactions with haptic settings, and (2) a space experiment (N = 3 cosmonauts) with a pre-mission session, three mission sessions on board the ISS (2, 4, and 6 weeks in space), and a post-mission session. Results provide evidence that distorted proprioception significantly affects motion smoothness in the early phase of adaptation to microgravity, while the magnitude of this effect was moderated by the cosmonauts' sensorimotor capabilities. Moreover, this sensorimotor impairment can be compensated by providing subtle haptic cues. Specifically, low damping improved tracking smoothness in both motion directions (sagittal and transverse motion planes), and low stiffness improved performance in the transverse motion plane.